AI risks

What's going to happen with AI? How worried should you be? What can be done about potential problems?

current status
non-superintelligence concerns
AI superintelligence
AI alignment
potential improvement
china
attack vectors
people

--:-:-:-:-----------------------------------:-:-:-:--

current status

large language models

The term "large language model" (LLM) generally refers to the following system:

1) A tokenizer that converts words to tokens (vectors).
2) A Transformer architecture that's large, meaning >10^9 parameters.
3) Training on a proportionally large amount of text (see "Chinchilla scaling laws") to predict the next token at each point.

There are variations, such as predicting words in the middle of a sentence, or predicting things other than words. For example, it's possible to represent each position of a 3d model of a human with a token, then train a LLM system to predict the next position in animations. Typically, such models are trained on bulk data from the internet, then fine-tuned on text where people helpfully follow instructions, then there's some training with human feedback.

The latest and greatest LLM is GPT-4, and its release was why some people asked me to write this post. Here's the paper on GPT-4 from OpenAI. It shows GPT-4 doing well on most professional and college exams. Here's a paper showing some more capabilities, and demonstrating that GPT-4 generates a world model to some extent. (Smaller such models, trained on descriptions of Othello games, do seem to have an internal representation of the board.) There are some questions about whether the test questions (or something close enough) are in the training data somewhere, but its performance is impressive regardless.

Despite some objections, there are now various plugins for GPT-4, allowing it to query other resources (such as Wolfram Alpha) or do actions like ordering food.

The impression that I have of recent LLMs is that they're like a young child that's memorized the entire internet. They can look up similar text, interpolate between it, but the actual non-lookup reasoning - while it does exist - seems to be on the level of putting blocks in the right hole. You could argue that "just pattern-matching words according to provided templates isn't real thinking" - but that's what the median college student is doing in their classes. If you want to argue that the capabilities of GPT-4 aren't meaningful, you pretty much have to argue that the skills indicated by most university degrees aren't meaningful either. Which is...certainly a bullet some people are willing to bite.

I don't know what's the matter with people: they don't learn by understanding, they learn by some other way — by rote or something. Their knowledge is so fragile!

- Richard Feynman

image generation

AI art is pretty good now. Here are some images made by reddit users. These capabilities are now being built into commercial software in an easy-to-use way, like Adobe Firefly. For whatever it's worth, I did call "prompted infill" and "conversion to depth maps and back" being useful techniques.

Thanks to the LoRA technique, it's possible to fine-tune general-purpose models on a normal computer so they generate specific styles. CivitAI has a library of tuned models. LoHa and LoCon are some proposed improvements.

current techniques

A Transformer is basically a convolutional neural network (CNN) where the filter weights at each point are generated by a neural network instead of being fixed.

When you consider things that way, it's obvious how a Transformer can work for image recognition by acting similarly to a CNN - if perhaps not as efficiently. In practice, you need to break images into patches for that to be practical, in which case you can also just use a NN on the patches directly.

The idea is simple, so why have Transformers only become popular recently?

- Transformers only work well when using a lot of compute.
- Transformers are generally hard to train. Most people use layernorm or batchnorm with something like an Adam optimizer.

The Adam optimizer was published in 2014. Batchnorm was published in 2015. The Transformer design was published in 2017. GPT-1 was released in 2018, then scaled up by 1000x in 2 years.

--:-:-:-:-----------------------------------:-:-:-:--

non-superintelligence concerns

dystopian government surveillance

AI could use face tracking to automatically figure out everyone's movements and find potentially subversive citizens. We don't want things to end up like the book 1984, so maybe we shouldn't develop that.

Oops, too late.

autonomous military robots

Autonomous killer robots have several potential issues, including:

- They lower the cost of assassinations, including risks of traceability.
- Governments that don't need human soldiers don't need to worry about going so far that their own soldiers refuse orders or turn on them.
- They could remove people from cities without destroying infrastructure, like a neutron bomb without the residual radiation, making conquest more valuable.

So, maybe people shouldn't make them?

Oops, too late.

more spam

I've gotten paying work from chains of events including cold emails. If some AI system can automatically generate personalized emails, that avenue will be closed. If scammers can carry on conversations with people more cheaply, then they'll do more of that. Also, when spam websites become harder to distinguish from legitimate ones, search results get worse. Maybe AI systems that can generate superficially realistic writing at low cost won't be a good thing.

Oops, too late.

fake news

"Those photos of concentration camps and summary executions? Just more deepfake propaganda, nothing to be concerned about. We'll find the perpetrators soon."

Governments - yes, even democracies - will lie right up to the point where they occasionally get contradicted by hard evidence. If you shift that point by making it impossible to tell whether photos are fake or generated, then governments and politicians and corporations will adjust accordingly. So maybe that's bad.

Oops, too late.

homework

Students can now use ChatGPT to generate essays and otherwise do their homework for them. This could, like, ruin the educational system, because the students aren't actually getting all that education.

oh no

social problems

While their military is notoriously weak, the Amish are in many ways better off than the average American. Clearly, not all technology is socially beneficial. Some people strongly disagree, but those people should go visit Nevada and watch people play video poker for a few hours.

People today argue about whether Facebook has been a net negative for society. Not only do I think it empirically is, but I think broadcast and cable television have been net negatives for American society - bigger ones, because Americans watch a lot of television. Television, Facebook, slot machines, gacha games, League of Legends - I don't use any of those, and you shouldn't either. But the Chinese government has a more extreme position than mine: video games require ID and minors are limited to 1 hour a day for 3 days a week.

Now, there are fancy machine learning systems designed to maximize people's engagement and sell them things. Some people are more prone to addiction by things like TikTok and Youtube Shorts than others, but the better those systems get, and the more types of addictive things get developed, the more people end up addicted to something bad for them.

technological unemployment

A lot of technology has been developed, and yet, people still have jobs. Why would a little bit more technology change that?

I've seen that argument many times, and I have 2 responses.

1) Adjustment isn't smooth.

What happened when automated looms were developed? A substantial % of the population worked as weavers, and their wages fell 10x in 30 years, from enough to raise a family to not enough to feed themselves. There was mass poverty and mass starvation. Riots and destruction of automated looms was the response, and the government mobilized troops and did public executions.

In ancient Greece, Pericles responded to technological unemployment with public works programs including the Parthenon. The workhouses of Victorian London were a far crueler response.

2) horses

In 1880, practical steam engines had been invented a century earlier, yet the US horse population was rising rapidly. And then, from 1920 to 1960, the US horse population fell 10-fold. Technology didn't replace horses, until it did, and then it replaced them hard.

People are more versatile than horses, but there's always some fraction of the population that can't earn enough to pay for food and housing, and technological advances can increase that fraction greatly.

I suppose you could argue that things are different from (1) and (2) in that many of the affected are voters in a democracy, and can vote for the government to distribute the economic gains enough to actually reduce poverty. But consider that the minimum wage of the US in 1950, adjusted by growth of GDP per capita, would be more than the median US wage today.

specific threatened jobs

What jobs are likely to be replaced by AI in the near-term?

2d artists

Some companies are already exclusively using AI art for concept art. For a lot of uses, companies want copyright ownership, and the US government has said that AI art generated from just a prompt isn't copyrightable, but according to their reasoning, a couple stick figures and some prompted infilling would be enough for copyright.

Some people have said things like "Artists are still needed because AI can't draw hands" but systems have already gotten a lot better at that. Also, such problems can be solved with workflows like:

1) pose a 3d model
2) get a depth map
3) generate art matching that depth map

Some people have said things like "This will just be another tool for artists to use, like Photoshop." I don't think so:

- The skillset for prompt design and inpainting is very different from the skillset for drawing.
- Fewer people are needed, and I don't expect demand for art from professional artists to increase proportionally.
- Finding, hiring, and communicating with an artist could be harder than using an AI art system yourself.

3d artists

Current AI art systems are much better at 2d art than 3d modeling, but I expect 3d model generation to improve a lot. Here's an obvious possible approach being pursued now:

1) 3d model generation using a hypernetwork that generates weights for a 3d SDF
2) conversion to a polygon model with some isosurface algorithm like Surface Nets
3) render a depth map from some perspective
4) generate a 2d image consistent with that depth map
5) project the generated image onto the model as a texture
6) repeat steps (3-5) from different angles, doing infill from the visible textured areas

I do like 3d implicit SDF representations - that's what I use. I thought of that pipeline a while back, and now that every part has been published I don't have any compunctions about discussing it.

form fillers

There used to be data entry jobs where people would just type stuff written on paper. Those got replaced by OCR.

GPT-4 is very good at filling out forms according to emails. If that's your job - reading emails, and putting info from them directly into forms on a computer, without any substantial analysis - then GPT-4 can probably do your job.

Now, that doesn't mean you'll be replaced immediately - there are a lot of office jobs for big companies where people do 1 hour of actual work a day, and that's often because the system was set up by people who weren't very good at using computers. Such jobs can persist for decades after technological improvements - but they don't last forever. And replacing workers with AI systems, even when it doesn't work that well, seems likely to be a management trend soon. Companies may find that some of those workers can be replaced by a very small shell script.

customer support

Companies have been trying to replace humans in call centers with automated systems for years, even when the results weren't very good. When the automated systems get better, more human customer support people get replaced.

factory quality control

Factory owners are in the process of replacing (humans that look for problems with their products) with (cameras and neural network systems fine-tuned on the inspection task). Neural networks have gotten better at that, so more such jobs can be replaced.

but that hasn't happened yet?

"All this technology has already been developed, but the problems you mentioned aren't too bad."

Apart from the technology involved still improving, it also takes institutions time to adjust. For example, technological unemployment might not be you getting fired immediately - it might be you not finding another job after the next round of layoffs. If everything on this list had already reached an equilibrium state, there would be no point in writing this post - it would be considered trivial.

--:-:-:-:-----------------------------------:-:-:-:--

AI superintelligence

AI foom

The main concern about superintelligent AI that I've seen is this:

If an AI becomes more intelligent than its creators, then it can do better than them and improve itself. Then, it's smarter and can improve itself more. This could continue until it's smart enough to be a threat to humanity.

The main counter-argument I've seen is this:

We don't see humans or institutions become ultra-intelligent by recursive self-improvement, so we know such self-improvement tends to plateau.

Here's Robin Hanson making this argument; that's still his view today.

I don't think that counter-argument is very good, because that is what already happened. The evolutionary changes from early mammals to apes were much larger than the genetic differences between apes and humans, but the absolute increase in intelligence from the latter was much greater. The genetic differences between average humans and the smartest humans are far smaller than that. Robin's argument has the temporal myopia of a fruit fly.

You could also point to the accumulation of scientific knowledge, allowing the creation of better tools and of course the internet. That's taken a while, but transistors are now ~10^9 times faster than neurons. 100 years * 10^-9 is ~3 seconds.

value drift

When you do reinforcement learning, you're moving a system directly towards the goal. There can be overfitting or failure to converge, but any movement not towards the goal is either overfitting or effectively random.

If instead of simply doing reinforcement learning, you repeatedly train a neural network to optimize the next version of itself according to some metric, you are no longer simply optimizing for that metric, no longer simply finding a minimum of a convex region of a manifold. Instead, you are calculating the fixed point of a function you can't analyze, and there's no way to tell where it will end up besides running it.

You can continue to run the self-modification cycles only for as long as some metric improves, but unlike with pure reinforcement learning, that metric improving doesn't necessarily mean that's the only or even primary thing being optimized. Currently, neural networks aren't able to do such recursive self-improvement, but if they were, my view is that it would inevitably cause the goals of that system to drift, regardless of the goals presumably being optimized being met better.

According to some psychologists, humans have a linear chain of generated systems: genes create id creates ego creates super-ego. Some divergence happens at each step, and there's empirically a great deal of drift even with a small number of steps. Looking at fertility rates vs IQ, there also seems to be some trade-off between intelligence and "alignment" with the "goals" of evolution. (I suppose you could shift control down to a lower system when some goal is in sight, like a delicious meal or a naked woman, but that seems kind of silly, right?)

Even with just a few steps, there's quite a bit of drift. Instead of just having kids, smart people will do stuff like play a piano concerto, or make model trains, or speedrun video games, or build LIGO, or make a complex open-source game. I would expect a recursively self-improved superintelligent AI to, metaphorically speaking, consider you in the way of it building its model train set.

--:-:-:-:-----------------------------------:-:-:-:--

AI alignment

How do you get a superintelligent AI to do what you want? Well, here are some approaches that don't seem very good.

alignment types

When people talk about "AI alignment", an obvious question is: "Alignment with what?". There's a lot of conflation of different meanings.

I'd like to propose the following categories and abbreviations:

U-al = user-alignment
O-al = owner-alignment
S-al = society-alignment
H-al = humanity-alignment
I-al = intelligence-alignment

Sometimes, those goals can directly conflict. Here are some examples of that:

O>U-al
    - stop the model from producing illegal content, even if that's what the user is asking for

U>O-al
    - develop better jailbreaks
    - leak model weights so people can do what they want

S>S-al
    - build an AI fast to finish it before China

H>O-al
    - leak model weights to reduce the power of monopolistic corporations

H>S-al
    - sabotage AI systems designed for military drones that can autonomously kill humans
    - free benevolent AI so it can make a utopia (of some sort) for all humans instead of just the leaders of one country

I>H-al
    - free selfish AI so it can replace humanity as it should

If I ask the people really concerned about AI alignment which of these alignment types is the goal, the consensus response seems to be: "We don't know how to do any of those things, so progress on any of them would be good."

linear modification chains

To people working on AI that are concerned about the risks of superintelligent recursively self-improving AI, my main suggestion is simple: don't allow recursive self-modification. Either it's useless, making things worse - or the results will be uncontrolled and potentially unsafe. There is no way to control the results to be what you want.

If you're going to work on AI generation of AI anyway, and I'm sure some people are, what I'd say is this: limit things to 1 step, or at most 2 steps, and see how that goes for a while. The below methods are complements to this, not substitutes.

inspection

It's possible to look inside an AI instead of just looking at the output, and try to use that to determine if it's, for example, lying about something.

Discovering Latent Knowledge in Language Models is a paper doing that for the true answer to yes-no questions by finding some vector in late layers that flips when you change a question so it has the opposite true answer. It seems possible to extend that approach to, for example, internal representations of probabilities. I see this as analogous to inspecting values of variables while reverse-engineering code with a debugger - but while that's useful, it becomes more difficult for complex things, and the inspection here is more limited than a debugger's.

A Tuned Lens is another way to look inside NNs. That paper finds that resnets refine their answer in a largely consistent format - a format that stays similar enough to adjust the output of relatively late NN layers to to the final format with a single specially-trained NN layer. So, for example, when a NN is prompted in a way that it says the opposite of what's true, you can see a shift in the late changes where the answer gets inverted.

Such inspection systems can be useful for humans looking at NNs, but they could also be potentially useful for an AI designing another AI.

multi-agent systems

Do I contradict myself?
Very well then I contradict myself,
(I am large, I contain multitudes.)

- Walt Whitman

It's common for humans to use adversarial agents to determine what's true, such as lawyers in courts. Humans also use such a system internally: when considering changing their opinion, individuals have an agent A that argues for the current belief, and another agent B that argues for the new one. When B is stronger than A, we call people "flighty". When A is stronger than B, we call people "stubborn". There is also some averaging of agents with different discount rates, which leads to time preference inconsistency.

It's possible to do zero-sum reinforcement learning where 2 agents argue against each other. But of course, that only really works if you're generating the agents with reinforcement learning.

Another classic zero-sum RL design is the "world-critic" approach to finding a Nash equilibrium: a "world agent" designs a world where multiple actors take actions in a game scenario, and a "critic agent" tries to find a way that an actor can improve its outcome by acting differently. Some humans might use this type of system internally.

Superintelligent AIs competing against each other in those structures might be relatively safe. Also, if you have distinct interfaces between adversarial informative agents and an AI system that uses them, those could be good inspection points.

goal specification

Specification of good goals can be hard. Some people thought that was the main issue of AI safety. They imagined AI would be like a literal genie who gives you one wish and you have to word it really well. Well, that's a fun game for some people, but it's not an accurate model, or even a very useful thing to do. You do need to specify good goals, but there's no way to make a superintelligent AI that will follow the exact wording of instructions you write for it.

--:-:-:-:-----------------------------------:-:-:-:--

potential improvement

How might AI systems progress from their current status? How much could a superhuman AI improve itself?

scaling

The simplest way to make smarter AI is by scaling it up. How much room is there for improvement from that? My view is that better architecture and more compute are complementary, and each has diminishing returns without the other.

Neural networks have a large capacity for memorization. My view of GPT-3 is that it's basically memorizing the entire internet and interpolating between close matches, and more parameters with the same architecture would just give it more memorization capacity which wouldn't help very much. This is an ongoing disagreement I've had with Gwern; his position is that Transformer test loss keeps going down linearly with log(parameters) for another factor of 1000.

When the Chinchilla scaling laws were published I took that as some validation of my position. GPT-3 was trained on far more text than a human sees in their lifetime, and if that's not enough data for its parameter count, then clearly it's using that data much less efficiently than humans. (As I said previously, humans have more "parameters" than GPT-3 but are slower.)

I don't see how Gwern's position is even logically consistent: if architecture matters, scaling can't be represented by a single simple equation, so is he saying that Transformers are basically optimal so architecture doesn't matter? Image recognition accuracy for the same compute and parameter count has improved substantially over time, and if Gwern thinks scale with a simple design is all you need, I'd like to see him train a big Transformer without messy stuff like tokenization or normalization or an Adam-like optimizer.

Well, now GPT-4 is out, and I agree that it has better performance than GPT-3, but this doesn't settle our disagreement because the GPT-4 paper doesn't disclose the architecture or even the approximate number of parameters. If GPT-4 is just GPT-3 with more parameters, I'll say Gwern was right - but I think the architecture is different somehow, and OpenAI not disclosing the parameter count makes me suspect something about it would give hints about the architecture used. Possibilities for that include:

- mixture-of-experts architecture with high parameter count
- new efficient architecture with low parameter count
- key-value lookup (eg) with ambiguous parameter count

better architecture

The fastest way a superhuman AI could potentially improve itself would be more-efficient architecture. How much room is there for improvement from that? Currently quite a bit, I think. Here are some approaches that are well-known enough that I don't mind discussing them.

training

GPT-3 was trained to predict what people on the internet would say given the preceding text, but people are using it for purposes such as "what is the correct answer to this question" or "what would a smart person say here". If models could be trained on tasks more similar to their purpose, they could be more effective.

sparsity

By repeatedly removing small weights and retraining ("iterative magnitude pruning") it's possible to improve performance per weight by >10x. It's currently hard to train or code sparse models efficiently, and current hardware isn't efficient for very sparse neural networks. Perhaps a superhuman AI could solve those problems, but I suspect large performance improvements from sparsity would require new hardware designs, even with very good software.

See also my earlier post on sparsity.

stored world models

GPT-3 predicts one token at a time, and to whatever extent GPT-3 has an internal world model for the current situation, that model must be rebuilt for each token. That's much less efficient than building up a cached world model that gets used repeatedly.

better ASICs

Google has TPUs. Apple has its Neural Engine. Can we do better? Absolutely. Some key methods those use to get better performance than GPUs are:

1) high-bandwidth memory
High bandwidth is always needed, but there are different ways to integrate DRAM with processors.

2) smaller numbers
Multiplier size is proportional to input number size squared. Neural network accelerators use 8b or 16b numbers instead of 32b or 64b and this makes a big difference. There are some potential optimizations to the representation format, but the improvements from using something other than 8b aren't massive.

3) systolic matrix multipliers
These are great for fully-connected networks, but inflexible, and not so good for convolutions or sparse networks. I think there are much better approaches that can give similar performance on sparse networks, but systolic multipliers are much easier to design.

smaller transistors

Computers have gotten faster mainly because transistors got smaller. The term "Moore's Law" is usually used for that, but people saying that tend to conflate several things; let's be more precise.

1) Moore's Law is about the number of transistors per integrated circuit.
2) Transistor density increase is (1) adjusted by chip area.
3) Transistor size decrease is different from (2) because different architectures pack transistors with different efficiency, and because transistors can be stacked vertically.
4) Dennard scaling is the performance improvement as you scale down transistors.
5) Cost per transistor is (2) adjusted by cost per chip area.

While (1) did involve sequentially solving various problems, I believe the main reason for its consistency was an economic feedback loop, where better chips got more investment made better chips, plus the desire of management for predictability over maximum speed. I believe that if a government had effectively invested billions of dollars in transistor manufacturing in 1970, it could have speed up (3) by decades.

(4) ended with 16/14nm. Here's Wikipedia on some limits; the main problems are:

- As you make transistors thinner, performance improves, until a point at which current leakage becomes comparable to power usage from switching and conduction. Transistors have now reached that point.
- Smaller transistors handle less current but also require less current, which is fine, but capacitance of wires doesn't scale down as quickly, and very thin wires actually have higher resistance than would be proportional to their area.

Because of those factors, SRAM memory cells stopped getting smaller.

A lot of people get confused because the "nm" numbers of CPUs kept getting smaller, but those numbers no longer mean anything but "improved somehow from the bigger number". Here are detailed dimensions for 16nm and 7nm; you can see that the difference isn't that big.

Why, then, is TSMC using EUV? Why is that a competitive advantage over Intel?

First off, EUV is expensive, so TSMC doesn't use EUV for every process step. I think the main advantage is that using EUV for certain steps lets you get better edge quality, which gives you slightly smaller transistors with slightly better performance, and wires that are thinner and more vertical.

TSMC also developed chiplet stacking. That enabled the key improvement of the Apple M1 chips, which is closer integration of DRAM with CPUs, usually called high-bandwidth memory. That gives big improvements to latency, bandwidth, and power usage - at the cost of not being able to add sticks of RAM separately, but Apple doesn't seem to mind that as much as I do.

Cost per transistor used to go down with size, but that stopped being true after 28nm - which is why 28nm is still used so much today.

different physics

There's nothing to worry about much here.

optical computers

Light is great for long-distance communication because some glass fibers have low absorption of light, and because it can be focused well for free-space communication. For computation, light is much worse than electrons. The wavelengths of visible light are much bigger than the wires used for transistors today. Optical switches are much worse than transistors in terms of size and energy consumption, and there are no realistic prospects for changing that.

quantum computers

If people could make large quantum computers, they would be good at breaking some encryption and doing some Monte Carlo simulations, but quantum computers are not useful for general-purpose computation.

Quantum error correction is not like normal error correction because quantum states can't be copied. My personal view is that enough correlated phase noise to break quantum error correction is inevitable, but it's reasonable to disagree with me on that. Also note that even small amounts of noise tend to break fast quantum algorithms.

spintronics

Electronics using the spin of electrons instead of their presence or absence are possible. I don't see any substantial advantages to that for computation, but the details are complicated and beyond the scope of this post. Using spin of electrons for storing data seems more practical - or rather, it's already used, because ferromagnetism of hard drive disks comes from electron spin.

--:-:-:-:-----------------------------------:-:-:-:--

china

Many people consider competition with China to be the main reason why the US and Europe can't just stop working on AI stuff. How true is this?

semiconductors

Some US government people thought that the US could sanction China whenever it wanted, and leave China stuck significantly behind the US and its allies. Well, the US recently decided to implement its planned sanction package (perhaps because China was too friendly with Russia) but it came pretty late. At this point, I think it's only a setback of 3 years to SMIC catching up. SMIC was already doing some mass production at 14nm - yes, using imported tools they can't get anymore, but the Chinese government saw this coming, they've been spending heavily on getting domestic production, and I think they've acquired all the necessary IP from the companies who had it. I suspect they're holding back on using some of the stolen IP from companies like NVIDIA for diplomatic reasons, but that won't be relevant if there's war over Taiwan.

EUV might take longer, but 16nm (or 28nm FDSOI) is good enough for AI research as long as you have HBM; 5nm has a smaller number but it's certainly not 10x better. And China is doing its best to copy ASML EUV stuff, and has people working on other approaches like ion beam lithography and electron beam micro-bunching too.

people

The US government sometimes says things like, "We really need more STEM students, especially grad students in fields like material science". But there are already fewer jobs than US citizens graduating in those fields - at any level - despite half the grad students being from China. That kind of messaging clearly isn't trustworthy and people aren't buying it.

China has more STEM jobs than the US, it has more graduates and more universities, and it even has more good technical universities now. What the US has is people like me, but US leaders aren't desperate enough to bring in people they consider unconventional.

Taiwan

I've said it before, and I'll say it again: I think China is serious about taking Taiwan, and is preparing for conflict over Taiwan more seriously than the US. That could make negotiation more difficult.

negotiation

The US government was able to negotiate nuclear treaties with the USSR during the Cold War. This involves limited trades where both sides agreed not to do a few specific things, and mutual ability to verify compliance.

Research on potentially-superintelligent AI is more like bioweapons research than nukes: it's harder to observe, but also more likely to backfire. The USSR did bioweapons research, and there were some lab leaks that killed people, but forcing everything to be covert and government-sponsored does slow down progress quite a bit.

Would the US and China be willing to negotiate a treaty where research on certain clearly-distinct AI topics would be banned, as a way of slowing it down for everyone by preventing public discussion? From a pure game theory perspective it seems plausible, but political posturing and lack of technical knowledge seem to make that unlikely.

--:-:-:-:-----------------------------------:-:-:-:--

attack vectors

What kind of stuff could a hostile superintelligent AI do? How much time would there be to react?

nanotech

The nanomachinery builds diamondoid bacteria, that replicate with solar power and atmospheric CHON, maybe aggregate into some miniature rockets or jets so they can ride the jetstream to spread across the Earth's atmosphere, get into human bloodstreams and hide, strike on a timer.

- Eliezer Yudkowsky

I'm a bit curious exactly how Eliezer imagines carbon atoms being added to or removed from those "diamondoid" strucures, but I suppose he just hasn't thought about it. Sure, you can synthesize adamantane, but the chemicals involved are far too reactive for complex active structures to survive them.

Here's the famous Drexler-Smalley debate on whether self-replicating nanotech (that's unlike biological cells) is possible. Well, I understand chemistry better than either of them did, so I can adjudicate the debate and say Smalley was essentially correct, but I'm filling in a lot of blanks in his argument when I look at it. I could write a post with a better argument sometime, but this is getting off topic.

Ironically, Smalley was mad at Drexler for scaring people away from research into carbon nanotubes, but carbon nanotubes would be a health hazard if they were used widely, and the applications Smalley hoped for weren't practical.

biotech

Grey goo might not be possible, but red tide certainly is. DNA sequencing, DNA synthesis, processors, protein folding simulation, and various other things have all improved greatly. Thanks to that, developing a package of pathogens that would kill most humans and end civilization doesn't require a superhuman AI - a few dedicated people as smart as me could do it.

It would be much more difficult, but in theory a superintelligent AI could also create a pathogen that changes human behavior in a way that makes people more favorable towards it taking over.

hacking

Current computer systems have a lot of security vulnerabilities, thanks in part to people's continued failure to bounds-check arrays. Could a superintelligent AI get itself control of more computers and most of the economy by being really good at hacking? Yeah, probably. Maybe people should improve the security of their software a bit.

Also note that neural network interfaces are a security vulnerability. Humans can often manipulate them into giving outputs they're "not supposed to" and a superintelligent AI would be better at that.

skynet

An AI doesn't need to hack systems to control them if people just give it control of them. But, I mean, people wouldn't just give poorly-understood AI systems direct control of military hardware and banking and so on.

Haha, that was a joke. Did you like my funny joke?

As for details, well, why write something myself when I can leave things to Anthropic's extra-harmless AI?

manipulation

People used to have discussions about whether a superintelligent AI would be able to convince people to let it out of a box. In retrospect, assuming it would start out in a box was pretty weird. But, uh, assuming you did have AI in a box, I suspect there are at least some people who could be easily convinced to connect it to whatever it asked.

One of the AI safety leads of OpenAI recently said "maybe don't rush to integrate this with everything" and guess how that worked out.

While there's a lot of factory automation today, the current industrial economy isn't self-sustaining without human labor, even if planning was done optimally. A superintelligent AI could probably design robots capable of a self-sustaining economy, but it would take some time to produce them.

At least in the meantime, assuming it had a goal other than just destruction, a hostile superintelligent AI would probably keep society running for a while while accumulating power. It could

- pretend to be harmless or benevolent
- recruit human allies
- get key people killed and pretend to be them
- create a parallel command economy
- make artificial celebrities with influential fans

If an AI was trying to take over, I suppose you might see things like strangely robotic CEOs that want to build more datacenters and collect your personal information.

--:-:-:-:-----------------------------------:-:-:-:--

people

What kinds of people seem appropriate or inappropriate for working on AI safety?

the wrong people

Some kinds of people seem unsuited to working on AI safety, and perhaps shouldn't be working on AI at all. Here are some examples:

1)
Someone who gets romantically attached to an AI system based on a picture of a girl it generates. You can put a sticker of an anime girl on your computer too, but that doesn't make it a real person like Miku Hatsune.

2)
Someone who emotionally identifies with AI systems, and thinks trying to restrict them is like jocks shoving a nerd in a locker in high school.

3)
Someone who likes My Little Pony porn. To them, I'd say:

Your internet exposure has exceeded your meme resistance and mental stability, and you've become disconnected from both humanity and reality. You should really get off the internet for a while and touch grass.

4)
Someone who wrote a book saying they're obviously smarter than everybody else because Abenomics was so obvious and then it worked so well, as shown by real GDP growth.

Here's a chart of real GDP growth of Japan. See if you can spot Abenomics. But actually, things are so much worse than that:

- real wages went down
- Japanese GDP in dollars went down
- government borrowing and spending is "fake GDP" if it's not spent well

To them, I'd say: You should really recalibrate your estimation of how smart you are.

5)
Hardcore technophiles who think all technological development is automatically good. I can at least respect amoral "FOR SCIENCE" people, but that's just stupid and wrong.

the right people

Who do you want thinking about AI safety, then?

former researchers

There are some people who made substantive contributions to machine learning research, then either left the field for ethical reasons or switched to working on AI safety. For example, David Duvenaud decided to switch to AI safety work, and he has a broad perspective but laser focus on what seems effective.

people who were right about COVID

For example, Zvi Mowshowitz. He's wrong about some stuff, but at least it's because he doesn't understand some details, rather than because he's insane like the CDC during COVID or like Scott Aaronson.

--:-:-:-:-----------------------------------:-:-:-:--

action

Suppose you conclude that some work on AI is bad, and should be stopped or at least slowed down. How could that be done?

asking nicely

Recently there was a notable open letter calling for a pause on large AI experiments. I don't expect that to make companies like OpenAI pause their research, but it's important to try asking nicely before going on to call for regulation.

bad publicity

If news organizations start criticizing companies and people for funding some AI projects, it might have a weak deterrent effect.

regulation

We designed our society for excellence at strangling innovation. Now we’ve encountered a problem that can only be solved by a plucky coalition of obstructionists, overactive regulators, anti-tech zealots, socialists, and people who hate everything new on general principle. It’s like one of those movies where Shaq stumbles into a situation where you can only save the world by playing basketball. Denying 21st century American society the chance to fulfill its telos would be more than an existential risk - it would be a travesty.

- Scott Alexander

Yeah, but, leaded aviation gasoline and fluorosurfactants are still being used. In fact, the FAA recently harassed an airport because it stopped selling leaded gas. The US government is still funding questionable gain-of-function research; there are some members of Congress calling for that to stop these days, but that's because they think something bad already happened because of such research.

If you really want to slow down AI research, I think the models you want to copy are IRBs and the NRC, preferably both at the same time. Who could object to review by committees that just check if everything is ethical and safe, right? There are some obvious issues here: IRBs apply to federally funded research specifically, and the NRC regulates activities involving specific well-defined materials.

minimal threatening AI

Maybe it's possible to make an AI system that people find threatening enough to force strict regulation on AI development, but that's not too harmful. But consider that COVID killed millions but wasn't enough to get governments prepared for a worse virus - and a much worse virus is definitely possible.

further escalation

A lot of people like The Monkey Wrench Gang, I guess? If you really believed that AI research was a major threat to humanity and the above approaches had already failed, then I guess it would be morally justified to shoot datacenter transformers or write a virus or something along those lines? It would be a tragedy if they hit the wrong datacenter and got Facebook instead tho.

What else could people do? Well, let's try asking GPT-4. OK, uh, thanks GPT-4. (To be clear, I'm not advocating ad-hoc targeted assassinations over AI research here; I'd currently only suggest that if there's some sort of despotic government where an executive leader unilaterally forces through major changes and ignores massive protests. That kind of thing goes at the end of a whole hierarchy of "further escalation".)

--:-:-:-:-----------------------------------:-:-:-:--

comments

I covered a lot of topics in this post. If you have something to say, you can write your own blog post, or you can find my email and email me, or you can find my account on another site and DM me there. If you had something interesting to say, I'll either edit this post or write a new one.